Bridging the Gap between Naive Bayes and Maximum Entropy Text Classification

Authors

  • Alfons Juan-Císcar
  • David Vilar
  • Hermann Ney
Abstract

The naive Bayes and maximum entropy approaches to text classification are typically discussed as completely unrelated techniques. In this paper, however, we show that both approaches are simply two different ways of doing parameter estimation for a common log-linear model of class posteriors. In particular, we show how to map the solution given by maximum entropy into an optimal solution for naive Bayes according to the conditional maximum likelihood criterion.
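To make the shared form concrete, the common log-linear model of the class posterior can be sketched as follows (standard multinomial notation is assumed here; the exact parameterization used in the paper may differ):

p(c \mid d) = \frac{\exp\bigl(\lambda_c + \sum_w \lambda_{c,w}\, N(w,d)\bigr)}{\sum_{c'} \exp\bigl(\lambda_{c'} + \sum_w \lambda_{c',w}\, N(w,d)\bigr)}

where N(w,d) is the count of word w in document d. Naive Bayes fills in the parameters generatively, with \lambda_c = \log p(c) and \lambda_{c,w} = \log p(w \mid c) estimated by joint maximum likelihood, whereas maximum entropy estimates the same \lambda parameters discriminatively by conditional maximum likelihood; the abstract's mapping takes the latter solution back into an optimal naive Bayes solution under the conditional maximum likelihood criterion.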


Related Papers

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the rapid increase in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and a Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), aimed at reducing the large feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...


A Survey Paper On Naive Bayes Classifier For Multi-Feature Based Text Mining

Text mining is a variant of a field called data mining. Text mining, also referred to as “Text Analytics”, is used to make unstructured data workable by computers. Text categorization, also called topic spotting, is the task of automatically classifying a set of documents into groups from a predefined set. Text classification is an essential application and research topic because of incr...


Using Maximum Entropy for Text Classification

This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution estimation technique widely used for a variety of natural language tasks, such as language modeling, part-of-speech tagging, and text segmentation. The underlying principle of maximum entropy is that without external knowledge, one should prefer distributions that are...
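For reference, the constrained optimization behind this principle can be sketched in generic notation (assumed here rather than taken from that paper): among all conditional distributions whose feature expectations match the empirical ones,

E_{p}[f_i] = E_{\tilde{p}}[f_i] \quad \text{for each feature } f_i,

maximum entropy selects the distribution with the highest conditional entropy, and the solution has the familiar log-linear form

p(c \mid d) \propto \exp\Bigl(\sum_i \lambda_i f_i(d,c)\Bigr),

with the weights \lambda_i trained to satisfy the constraints, which is equivalent to conditional maximum likelihood estimation.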


Classifying Linux Shell Commands using Naive Bayes Sequence Model

Using Linux shell commands is a challenging task for most people new to Linux. This paper presents the idea of converting natural language into the equivalent Linux shell command. To achieve this conversion, we make use of a Naive Bayes text classifier. However, there could be cases involving a series of flags and combinations of commands. This is handled by a sequence of Naive Bayes text classifiers...


Classification of Text Documents Based on Minimum System Entropy

In this paper, we describe a new approach to classification of text documents based on the minimization of system entropy, i.e., the overall uncertainty associated with the joint distribution of words and labels in the collection. The classification algorithm assigns a class label to a new document in such a way that its insertion into the system results in the maximum decrease (or least increa...
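In generic terms, the decision rule described in this summary can be written as (notation assumed here, not taken from that paper):

\hat{c}(d) = \arg\min_{c} \; H\bigl(\mathcal{D} \cup \{(d, c)\}\bigr),

where \mathcal{D} is the labelled collection and H(\cdot) is the system entropy, i.e. the overall uncertainty of the joint distribution of words and labels; choosing the label that yields the maximum decrease (or least increase) in entropy is the same as choosing the label that minimizes the resulting system entropy.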



Journal:

Volume   Issue

Pages  -

Publication date: 2007